Can AI Match Human Experts? Evaluating LLM-Generated Feedback on Resident Scholarly Projects
This study demonstrates that an open-weight LLM (LLaMA-3.1) can generate rubric-aligned formative feedback on resident scholarly projects that approaches expert human quality overall. The model excels particularly in safety assessments and on specific project types, although human evaluators generally retain a slight edge in reasoning and trust.